 | Document: FSC-0084
 | Version:  001
 | Date:     03 September 1995
 |
 | Denis Bider, FidoNet#2:380/129.0

/*
  Document: Electronic Data Exchange standard level 1
  File:     EDX1.TXT
  Purpose:  a straightforward data exchange standard with space to expand
  Author:   denis bider, ofs->FidoNet#2:380/129.0

  Copyright (C) 1994-1995 by denis bider. See DISCLAIM.TXT.
  Send *any* comments to one of my addresses as listed above.

========================================================================
  Introduction
========================================================================

After a year of development and all sorts of improvements, EDX has
finally reached the state where it has nearly everything currently
wanted from a mail format, and it is now being released to the general
public. My opinion is that it was well worth the wait; anyway, this is
up to you to decide.

EDX is meant as a standard for electronic computer networks that
exchange messages, files and similar data. It redesigns all the existing
chaos from the beginning and tries not to make the same mistakes other
similar standards made. It does its own work, others do theirs.

EDX is not necessarily better than other such standards. It might also
be the worst of all. This document will try to convince you of neither.
It will simply describe the standard from beginning to end. Due to my
relatively poor English, I may not succeed in the "easy to understand"
part, but well, you'll just have to get along with it.

Please mail me all comments you might have.

======================================================================
  Notes, definitions
======================================================================

  Null:            ASCII 0
  CR:              Carriage Return (Enter) - ASCII 13
  a long:          a 32-bit (4-byte) signed value.
  an int:          a 16-bit (2-byte) signed value.
  a char(acter):   an 8-bit (1-byte) value.
  a ulong:         an unsigned long.
  a uint:          an unsigned int.
  A subfield:      a variable-length data field most commonly used in
                   other data fields. Consists of a subfield ID
                   (a uint), a subfield data length ("datlen")
                   identifier (a ulong) and <datlen> bytes of data. Ie:

                     ulong datlen
                     ulong ID
                     char  data[datlen]

  0x<value>        value in hexadecimal (base 16).
  Lowercase:       When a string or character is said to be "lowercase",
                   that means that any characters between and including
                   ASCII 'A'..'Z' are represented as their 'a'..'z'
                   counterparts. The conversion applies to *no other
                   characters in any national alphabets*.

  * All mentioned CRCs are, as in Zmodem, 0xffffffff based.

  * All multi-byte items (words, longs) mentioned are expressed in
    Intel format, which means the least significant byte (LSB) is
    presented first. (Eg, 0xff11 should be presented as 0x11 0xff)

======================================================================
  Views
======================================================================

  The network
 =============

My opinion is that the most basic set of layers into which all computer
network technologies can be divided contains the following:

  1: Physical point-to-point connection layer
  2: Physical network layer
  3: Logical point-to-point connection layer
  4: Logical network layer

Let's explain that using the example of Fidonet, a typical
over-the-phone network technology. In this case, the physical
point-to-point connections are telephone wires; the physical network is
all those point-to-point connections combined; the logical
point-to-point connections are modem dial-up connections; and the
logical network is, roughly, all those logical connections combined.

Much the same applies to, say, the Internet telnet feature: the physical
point-to-point connections are the low-level connections between
Internet-connected computers, the physical network is all of these
combined, and the logical connection is the telnet feature itself. There
is, of course, no logical network layer. The same goes for a connection
to a local BBS.
EDX is a standard that defines the fourth, logical network layer. A
"Recommendations" chapter is provided in which a sample interaction
between the fourth and the third network layer is defined; however, that
chapter should not be treated as a part of EDX itself.

  The site
 ==========

In everyday practice, I encounter many inconsistencies in how systems
are generally treated. Often, one says "BBS" meaning "mail system", or
even meaning the entire site. So let's define these terms.

  1. The site is all the hardware, software and peopleware, and is often
     referred to as the "system".

  2. The mail system is the part of the site that deals with networks,
     with "external relations". If you're in an OFT network and run
     SomeScan in combination with OtherMail, these two programs are your
     mail system from the viewpoint of the network you're in.

  3. The BBS is the part of the site that deals with human callers, and
     has nothing to do with the part of the site called the "mail
     system", except that the two parts can and usually do exchange data
     (messages, files).

My opinion is that the mail system and the BBS part of a site should be
kept separate, but often that is not the case. Take QWK networks for
example, where not only are the two concepts totally mixed up, but
networks also quite often meddle in things that are none of their
business; a network as an organization should care about the systems,
not about the BBSes or even the entire sites, but that mistake is often
made.

  The points
 ============

In networks like FidoNet, a user often installs mailing software and
becomes what is called "a point". A point system is, in EDX, treated as
any other system. Indeed, *every* system is actually a point system;
it's only that those systems that are talked about as "nodes" have a
point number of zero. See the note below, which states that in EDX, if
OFT addresses are used, all address fields must be present, zero or not.
Therefore, when an application receives or sends mail from/to a point,
the "point" system must be treated as any other system. In EDX terms,
points are full-fledged systems and that is exactly how they must be
treated; they are included in SENTTO and TRACE subfields as well.

The limitation of a point being linked to only a single system (ie, what
the former organization called "a boss") is gone and buried; as said,
EDX does not distinguish point systems from any other type of system.
Any differences in the treatment of point systems in other parts of a
network do not affect how EDX treats them.

========================================================================
  Addresses in EDX
========================================================================

EDX uses E-Addressing for maximum compatibility with various addressing
systems and to stay independent of the addressing scheme used by the
underlying network. However, only site E-Addresses are used in EDX;
usage of a user E-Address in any field of an EDX message is considered a
violation of the specifications.

The general format of a site E-Address is:

  <format> "->" <siteaddr>

<format> specifies the format of the <siteaddr> field.

An E-Address is assumed not to contain any whitespace. E-Addresses may
or may not be case sensitive, depending on the contents of the <format>
field; for that reason, when passing E-Addresses on, their case should
be left untouched. For now, all known types of E-Addresses are case
INsensitive.

The following formats are recognized:

  Format identifier:  "ofs" (Traditional FTN style)
  <siteaddr> format:  <netid> "#" <zone> ":" <net> "/" <node> "." <point>
  Example addresses:  ofs->FidoNet#2:380/129.0
  ALL ADDRESS COMPONENTS ARE REQUIRED. NO EXCEPTIONS.

  Format identifier:  "itn" (Internet e-mail style)
  <siteaddr> format:  <sth> {"." <sth>}
  Example addresses:  itn->f129.n380.z2.fidonet.org
                      itn->ixtas.fer.uni-lj.si

All format identifiers are and will be three characters in length.

========================================================================
  The logical network layer
========================================================================

This chapter describes the logical network layer, which is independent
of the lower layers. One way to actually pass what is defined in this
chapter from one system to another is described in the Recommendations
chapter. The reason for this separation is that EDX is exclusively a
layer 4 protocol definition and does not want to mix with other network
layers; ie., a network must by itself choose or define the layer 3, 2
and 1 protocols it is going to use with EDX. However, in order to
standardize EDX-related matters, a chapter with some recommendations is
provided towards the end of the document.

The idea of this independent part of the logical network layer is
similar to the way in which messages are stored in the JAM message base
format; each message consists of a binary header for fixed-length data
and an arbitrary number of subfields that contain other, variable-length
data.

An EDX subfield consists of, as laid out in the Notes section, a datlen
identifier, an ID and data. Subfields with an unknown ID should be left
untouched when exported to other systems.

========================================================================
  The message
========================================================================

EDX messages differ a little from other network types' messages: in EDX,
messages need not consist of text only, or of text at all, and a message
can have more than one receiver.
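
As an illustration of the subfield layout referred to above, here is a
minimal Python sketch of walking a block of subfields. It is not part of
the standard. It assumes the layout line from the Notes section (ulong
datlen, ulong ID, then the data, all in Intel byte order); note that the
prose of the same definition calls the ID a uint, so the width of the ID
is an assumption here. The names iter_subfields and pack_subfield are
made up for this example.

```python
import struct
from typing import Iterator, Tuple

# ASSUMPTION: subfield head is "ulong datlen, ulong ID", little-endian,
# following the layout line in the Notes section (not the "uint" prose).
SUBFIELD_HEAD = struct.Struct("<II")


def iter_subfields(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (ID, data) for every subfield in buf, in their stored order."""
    pos = 0
    while pos < len(buf):
        if pos + SUBFIELD_HEAD.size > len(buf):
            raise ValueError("truncated subfield head")
        datlen, sub_id = SUBFIELD_HEAD.unpack_from(buf, pos)
        pos += SUBFIELD_HEAD.size
        data = buf[pos:pos + datlen]
        if len(data) != datlen:
            raise ValueError("truncated subfield data")
        pos += datlen
        yield sub_id, data


def pack_subfield(sub_id: int, data: bytes) -> bytes:
    """Serialize one subfield; unknown IDs are written back unchanged."""
    return SUBFIELD_HEAD.pack(len(data), sub_id) + data


# Round trip: a SUBJECT subfield (ID 7) plus a subfield with an unknown ID.
raw = pack_subfield(7, b"Hello") + pack_subfield(4242, b"\x01\x02")
parsed = list(iter_subfields(raw))
rebuilt = b"".join(pack_subfield(i, d) for i, d in parsed)
```

Because the walker never interprets the data of a subfield, re-packing
the parsed list reproduces the input byte for byte, which is one simple
way to leave subfields with an unknown ID untouched.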

  True crossposting and other goodies
========================================================================

For quite a while at first, true crossposting (a single physical message
belonging to more than one echo) was a part of the EDX specifications.
However, it is my opinion that, in the current state of things, it would
cause many more problems than it would solve, so this "feature" has been
removed. Utypia-style ROUTE directions were formerly present as well,
and have been removed for the same reason.

  Message header
========================================================================

The binary message header layout follows:

  char  signature[8]  // Must match <E><D><X><_><M><S><G><NULL>
  uint  hdrlen        // The size of the header
  int   utcoffset     // UTC offset, *signed*; see timestamp
  ulong timestamp     // Local time of message's creation
  ulong subflen       // Length of the subfields that follow
  ulong attribute1    // Message attributes
  ulong seqno         // Message's sequential number

hdrlen specifies the size of the header, from and including the first
byte of the signature field to and including the last byte of the last
present field. It is used mainly to ensure downward compatibility with
hypothetical EDX levels higher than 1. Should an application encounter a
hdrlen higher than it supports, it should only process the fields it
supports and skip the others. Should it encounter a hdrlen lower than it
supports, it should only process fields up to <hdrlen> bytes. Note that
the hdrlen value cannot be just arbitrarily picked! When creating a
header, always include the whole contents of the highest header revision
you support; otherwise, it is perfectly all right for a processing
application to dismiss the message in its entirety.

timestamp contains the local date and time when the message was written,
or if that information isn't available, when it joined network flow.
It is expressed as the number of seconds elapsed since 00:00:00, January
1st 1970; the time should be (= must be) represented in UTC. The UTC
offset of the site that generated the timestamp as described above is
stored in the utcoffset field. Eg: if the UTC offset is -0230, the
utcoffset field should read, simply, -230; +0200 => 200; and so forth.

The seqno field is the message's sequential number. For each area an EDX
system is linked to, it maintains the number of messages it has exported
from that area. When the next message is exported, that number is
incremented by 1 and is also assigned to the message as its serial
number. The main use of this serial number is that one can quickly see
whether they received all the messages from a particular system in a
particular area; if they didn't, messages are getting lost somewhere.
The serial number might also be used as a means of dupe-link detection.
However, even if the serial numbers of two messages don't match, one of
them can still be a dupe of the other; the system might have exported
the message twice. Therefore, you should stick to the MSGID subfield for
duplicate message checking; the serial numbers of duplicate messages can
be used to determine the cause of duplication.

  Message attributes
========================================================================

The following bits for attribute1 are defined:

  HasFiles   0x01L  The message has files attached
  IsReply    0x02L  The message is a reply
  ReceiptRq  0x04L  (netmail messages only) A return receipt should be
                    generated for the message when it is received by the
                    destination system.
  ConfirmRq  0x08L  (netmail only) A return receipt should be generated
                    for the message when it is read by each of its
                    addressees.
  IsReceipt  0x10L  (netmail only) The message is a return receipt.
  Echoed     0x20L  If set, the message contains an ECHO subfield.
                    If not set, the message contains a DEST subfield.

Other bits should be set to 0.
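
The header layout and the attribute bits above can be sketched in Python
as follows. This is only an illustration under stated assumptions, not a
reference implementation: it assumes the seven header fields are packed
back to back without padding (28 bytes for level 1), and it simply
rejects a hdrlen smaller than that instead of processing a partial
header. The names parse_header, utcoffset_minutes and check_attributes
are invented for this example; check_attributes also applies the rule
that IsReceipt excludes ReceiptRq and ConfirmRq.

```python
import enum
import struct
from typing import NamedTuple

# ASSUMPTION: fields are packed without padding, little-endian (28 bytes).
HEADER = struct.Struct("<8sHhIIII")
SIGNATURE = b"EDX_MSG\x00"
KNOWN_BITS = 0x3F  # the six attribute1 bits defined for level 1


class Attr(enum.IntFlag):
    HAS_FILES = 0x01
    IS_REPLY = 0x02
    RECEIPT_RQ = 0x04
    CONFIRM_RQ = 0x08
    IS_RECEIPT = 0x10
    ECHOED = 0x20


class Header(NamedTuple):
    hdrlen: int
    utcoffset: int
    timestamp: int
    subflen: int
    attribute1: int
    seqno: int


def parse_header(buf: bytes) -> Header:
    """Decode a level 1 header; subfields start at offset hdr.hdrlen."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than a level 1 header")
    sig, *fields = HEADER.unpack_from(buf)
    if sig != SIGNATURE:
        raise ValueError("bad signature")
    hdr = Header(*fields)
    if hdr.hdrlen < HEADER.size:
        # Simplification of this sketch: partial headers are not handled.
        raise ValueError("hdrlen below level 1 size")
    return hdr


def utcoffset_minutes(value: int) -> int:
    """Convert the sHHMM-style utcoffset (eg -230) to minutes (-150)."""
    sign = -1 if value < 0 else 1
    hours, minutes = divmod(abs(value), 100)
    return sign * (hours * 60 + minutes)


def check_attributes(attr: int) -> bool:
    """True if no undefined bit is set and IsReceipt stands alone."""
    if attr & ~KNOWN_BITS:
        return False
    if attr & Attr.IS_RECEIPT and attr & (Attr.RECEIPT_RQ | Attr.CONFIRM_RQ):
        return False
    return True


sample = HEADER.pack(SIGNATURE, 28, -230, 812678400, 0, 0x21, 5)
hdr = parse_header(sample)
```

A reader that meets a larger hdrlen can still use this decoder and then
skip to offset hdrlen before reading the subfields.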
IsReceipt cannot be set in combination with ReceiptRq and/or ConfirmRq.

  Subfields
========================================================================

A short list of subfields and their IDs:

  DEST (0), ORIGIN (1), AUTHOR (2), ECHO (3), WHOTO (4), TRACE (5),
  CHARSET (6), SUBJECT (7), CREATOR (8), EXPORTER (9), SENTTO (10),
  MSGID (11), REPLYID (12), TEXT (1000), FILE (1001)

Each subfield is an independent unit in itself. However, to make it
easier to produce simple and readable EDX handling code, two major types
of subfields are recognized, "simple" and "complex".

The "simple" subfields are subfields that have a maximum length of 100
characters. They usually contain a stream of textual characters. Please
note that if a simple subfield contains text, it is *not*
null-terminated. Its length is determined by the "datlen" identifier in
the subfield header. As said, the maximum length for simple subfields is
100 characters; all data beyond the 100th character can be ignored.
Simple subfields have IDs in the range 0..999.

The "complex" subfields are all other subfields. Their maximum size and
other attributes are specific to each of them. Their IDs range from 1000
on.

Note: read what the subfield descriptions say. If, for example, the
Presence field says "exactly one", that means that *exactly one*
subfield of this type should be inserted in the message, no more, no
less. The same applies to the other fields and to everything else in the
document.

SUBFIELD: DEST (simple)
  ID: 0
  Presence: Either one DEST subfield or one ECHO subfield

  The DEST subfield stores the address of the system to route the
  message to. It is up to the systems that are passing the message to
  decide if and how to actually route the message there.

  For historical reasons, messages with a DEST subfield are called
  "netmail". Messages with an ECHO subfield are called "echomail".

  A netmail message is considered private between its authors and its
  addressees.

SUBFIELD: ORIGIN (simple)
  ID: 1
  Presence: Exactly one

  Contains:
    * the E-Address of the system that generated the message
    * a NULL character
    * the name of the person that wrote the message

  Gating: see Origin supplementary line. Also, as opposed to, for
  example, FidoNet, the gating system does not insert its own address in
  the ORIGIN subfield when a message is gated to EDX, but instead
  converts the original origination address to E-Address format and puts
  it here. The address of the gating system itself is stored as a part
  of a gated TRACE subfield. (See TRACE subfield)

SUBFIELD: AUTHOR (simple)
  ID: 2
  Presence: Zero or more

  Format of contents:
    * the E-Address of the system where the person can be reached
    * a NULL character
    * the name of the person

  Each AUTHOR subfield lists one of the message's authors if there is
  more than one or if the message's author is not the message's physical
  sender. All of the message's authors should be listed, whether or not
  they already "reside" in the ORIGIN subfield.

  Gating to network formats that only support a sender name (like QWK or
  OFT): use Author supplementary lines.

SUBFIELD: ECHO (simple)
  ID: 3
  Presence: Either an ECHO subfield or a DEST subfield

  The subfield specifies the name of the echo area to which the message
  has been posted. The contents of the ECHO subfield should be treated
  as case insensitive. For the echo area name, all characters from ASCII
  33 to 126 are allowed, with the exception that '-', '+' and '%' must
  not be the first character of the area name and that '*' and '?' must
  not be present at all.

  If there is no DEST or ECHO subfield in a message, the message should
  be shown to the sysop and its distribution among systems stopped.

  An echoed message is considered public.

SUBFIELD: WHOTO (simple)
  ID: 4
  Presence: Zero or more

  Each WHOTO subfield specifies the name of a person whose attention
  should be drawn to the message.
  The WHOTO subfield is, by its function, very much the same as To:
  lines in FidoNet and similar networks, except that EDX allows more
  than one addressee per message (by allowing multiple WHOTO subfields
  to be present).

  If no WHOTO subfield is present in a message with an ECHO subfield,
  the message should be assumed to be of equal importance to everybody.
  (Ie, the same as "To: All" in the analogy above)

  If no WHOTO subfield is present in a message with a DEST subfield, the
  message is assumed to be addressed to the operator of the system it is
  destined for.

  Gating to networks that don't support as many message addressees as
  the gated message has: use Whoto supplementary lines.

SUBFIELD: TRACE (simple)
  ID: 5
  Presence: Exactly one

  There are three formats for a TRACE subfield: "prevnet", "gated" and
  "native".

  The gated and prevnet formats are used only when converting a message
  from a parallel format to EDX and should not be used otherwise.

  The prevnet format reads:

    "<= <text-of-parallel-trace-information>"

  It is used to store TRACE information of the previous network.

  The gated format reads:

    "++ <time>, <site E-Address>, <progname>, from: <prev net fmt>"

  It is used to signify that a message has been gated from another
  network and is inserted by the gating program. See the native format
  for a description of the gated entry fields mentioned here.

  Each EDX-compliant system, when exporting a message to other systems,
  must add its own TRACE subfield of the "native" type to the message,
  and it should do so in such a way that all previously existent TRACE
  subfields are listed *before* the added TRACE subfield. This is
  essential: the order of TRACE subfields must always be kept when
  passing the message to other systems. No more than one native TRACE
  subfield may be appended.

  Also, prior to exporting a message, the native TRACE subfields should
  be checked for the presence of our own E-Address, and if it is found
  (a TRACE subfield with our address is already present), the message
  should not be processed.
  An exception to this rule is only made if our native entry is the last
  in the list; in this case, the message should be forwarded to other
  systems, but another native entry should not be added.

  If a system holds multiple addresses, only one of them should be
  written to the TRACE subfield, but all of them should be checked when
  checking whether the message was already processed by the system.

  The format of the native TRACE subfield entry is:

    ".. <time>, <site E-Address>, <program id>"

  where ".." are indeed two periods (dots), <program id> should contain
  the name and version of the program that added the subfield entry and
  not exceed 25 characters, and <time> is the time when the message was
  processed by the system whose site address is specified in
  <site E-Address>.

  Timestamp format is:

    YYMMDD HHmm sUUUU

  where (all components of the timestamp are zero-padded to their full
  length):

    YY    is the last two digits of the year
    MM    is the month
    DD    is the day
    HH    is the hour
    mm    is the minute
    s     is the sign for UUUU (either + or -)
    UUUU  is the UTC offset of the system that generated the timestamp

  The YYMMDD HHmm part corresponds to the local time of the site.

  For example, 7th November 2007, 13:57, UTC offset 0200 positive:

    071107 1357 +0200

  Gating in general: the gating program should always add a "gated"
  TRACE subfield together with other TRACE subfields it created when
  gating the message.

  OFT gating: for ROUTE-ed (netmail) messages, the TRACE subfield is
  parallel to the Via kludge; when gated to OFT, the information from
  TRACE should be mirrored to Via, while when gated to EDX, the
  information from Via (without the "^Via: " prefix) should be mirrored
  to TRACE subfields using the prevnet entry format. If any mirrored Via
  line information is prefixed with "EDX<= ", "EDX++ " or "EDX.. ", the
  "EDX" pre-prefix should be removed and the "<= " prefix not added. For
  echoed messages, the TRACE subfield is not to be gated. Puzzled? Study
  the example below:

    <TRACE> .. 970101 1300 +1200, ofs->FidoNet#2:380/129.0, StupiToss v1.23
    <TRACE> .. 970101 1330 +1200, ofs->FidoNet#2:380/100.0, SmarToss v2.34

  Gated to OFT:

    ^AVia: EDX.. 970101 1300 +1200, ofs->FidoNet#2:380/129.0, StupiToss v1.23
    ^AVia: EDX.. 970101 1330 +1200, ofs->FidoNet#2:380/100.0, SmarToss v2.34
    ^AVia: FidoNet#2:345/678.0 SnailConvert Mon, 30 Feb 00 at 24:61

  Gated back to EDX:

    <TRACE> .. 970101 1300 +1200, ofs->FidoNet#2:380/129.0, StupiToss v1.23
    <TRACE> .. 970101 1330 +1200, ofs->FidoNet#2:380/100.0, SmarToss v2.34
    <TRACE> <= FidoNet#2:345/678.0 SnailConvert Mon, 30 Feb 00 at 24:61
    <TRACE> ++ 970112 2001 +3456, ofs->FidoNet#3:456/789.0, WMail v3.45,
            from: OFT <...>

  Gating for networks with similar TRACE control: see OFT gating. Of
  course, if the destination network format supports TRACE information
  in echoed messages, it should be used.

  Converting to JAM: forget JAM's internal format and use EDX's
  "international" format as described above, ie. "EDX.. <...>",
  "EDX<= <...>" and "EDX++ <...>".

SUBFIELD: CHARSET (simple)
  ID: 6
  Presence: Exactly one

  Contains the name of the character set that was used when writing the
  message, if not LATIN-1.

  People of each country should settle on a few commonly used character
  sets and their ID strings for the EDX CHARSET subfield; in Slovenia,
  for example, this subfield will usually contain "CP852", while for,
  say, the USA, it will probably always contain "CP437".

SUBFIELD: SUBJECT (simple)
  ID: 7
  Presence: Zero or one

  The SUBJECT subfield should contain a short description of what the
  message's text is about.

  When gating a message, if the subject is longer than what is supported
  by the destination network format, the Subject supplementary line
  should be used. (See next chapter)

SUBFIELD: CREATOR (simple)
  ID: 8
  Presence: Zero or one

  The subfield contains the name of the program with which the message
  was originally written. It should be omitted if that program is the
  same one that created the packet.
  The stated rule may or may not apply if the CREATOR and EXPORTER
  programs are different, but from the same package.

  Gating for network formats that do not feature anything parallel to
  the CREATOR subfield: use the Creator supplementary line.

  OFT gating: when exporting to OFT, use the Creator supplementary line
  because of PID restrictions. However, when importing from OFT, PID
  should be converted to CREATOR.

SUBFIELD: EXPORTER (simple)
  ID: 9
  Presence: Zero or one

  The subfield contains the name of the program that entered the message
  into network flow. It should be omitted if that program is the same
  one that created the packet. The stated rule may or may not apply if
  the CREATOR and EXPORTER programs are different, but from the same
  package.

  Gating for network formats that do not feature anything parallel to
  the EXPORTER subfield: use the Exporter supplementary line.

  OFT gating: when exporting to OFT, use the Exporter supplementary line
  because of TID restrictions. However, when importing from OFT, TID
  should be converted to EXPORTER.

SUBFIELD: SENTTO (simple)
  ID: 10
  Presence: Exactly one with ECHO subfields, none with DEST

  The SENTTO subfield contains from 1 to 25 ulongs.

  The SENTTO subfield is intended to provide a means for implementing
  fully connected polygons (networks or parts of networks where all
  participating systems send mail directly to all other systems).

  Each ulong in the SENTTO subfield should contain a 32-bit CRC of the
  E-Address of one of the systems to which the previous system in the
  chain has exported the message in which the SENTTO subfield appears.
  The all-lowercase representation of the E-Address should be used when
  calculating the CRC.

  If the CRC of a system's E-Address is already included in the SENTTO
  subfield of a message, that message should not be sent to that system
  again.

  Each system should, when exporting a message to another system, create
  a *new* SENTTO subfield with the CRCs of the addresses of the systems
  to which it is sending the message now.
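
  The SENTTO rules above can be sketched in Python as follows. This is a
  hedged illustration, not the author's code. It assumes that "32-bit
  CRC as in Zmodem, 0xffffffff based" is the common CRC-32 that Python's
  zlib.crc32 computes, including its final bit inversion; that detail is
  an assumption. The helper names eaddr_crc, build_sentto and
  already_sent are invented for the example.

```python
import struct
import zlib
from typing import Sequence

MAX_SENTTO = 25  # the subfield holds from 1 to 25 ulongs


def eaddr_crc(eaddr: str) -> int:
    """CRC-32 of the lowercase E-Address.

    bytes.lower() only touches ASCII 'A'..'Z', matching the document's
    definition of "lowercase".
    ASSUMPTION: zlib.crc32 is the CRC variant the document means.
    """
    return zlib.crc32(eaddr.encode("ascii").lower()) & 0xFFFFFFFF


def build_sentto(recipients: Sequence[str]) -> bytes:
    """Build the data of a *new* SENTTO subfield for this export."""
    if not 1 <= len(recipients) <= MAX_SENTTO:
        raise ValueError("SENTTO holds between 1 and 25 addresses")
    return b"".join(struct.pack("<I", eaddr_crc(a)) for a in recipients)


def already_sent(sentto: bytes, eaddr: str) -> bool:
    """True if eaddr's CRC is listed, ie. do not send to it again."""
    if len(sentto) % 4:
        raise ValueError("SENTTO data is not a whole number of ulongs")
    crcs = struct.unpack("<%dI" % (len(sentto) // 4), sentto)
    return eaddr_crc(eaddr) in crcs


payload = build_sentto(["ofs->FidoNet#2:380/129.0",
                        "ofs->FidoNet#2:380/100.0"])
```

  Since only CRCs are stored, two different addresses could in principle
  share a value; the sketch does nothing about that case.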
  The SENTTO subfield is mandatory in messages with an ECHO subfield,
  but should not be included in messages with a DEST subfield.

  Gating: always removed when gated.

SUBFIELD: MSGID (simple)
  ID: 11
  Presence: Exactly one

  The MSGID subfield contains text that represents the string assigned
  to the message by the system it was sent from. When the MSGID has been
  created on an EDX-compliant system, its format should be:

    <hexno1><hexno2><hexno3>

  All of them are numbers in hexadecimal notation, the first two padded
  to 8 characters and the third padded to 4 characters in length, with
  no separator characters (whitespace, for example) inserted in between.

    <hexno1> is the 32-bit CRC of the message text; the algorithm is the
             same as used in ZModem.
    <hexno2> contains a 32-bit sum of all characters in the message text
             (that is, for i = 1 to textlen do value = value +
             character), first initialized to zero.
    <hexno3> contains the 16-bit CRC of the message text; the algorithm
             is the same as used in XModem.

  The MSGID should *never* be changed once the message is being
  distributed.

  Note that at no point should this information serve as a means to
  check whether the message text has been passed on intact; a processing
  application should always treat the MSGID subfield as being in an
  unknown format. However, the MSGID subfield is assumed not to contain
  unprintable characters, that is, it should always contain characters
  between and including ASCII 32..126.

  Gating: when converting to another message format, always use the
  MsgID supplementary line to store the message ID. However, the
  destination message's message ID field should be set as well; when the
  contents of the MSGID subfield are longer than what is supported by
  the destination format or contain characters that should not be
  present there, a 32-bit CRC of the contents of the MSGID subfield is
  used instead. If an origination address is needed, it is taken from
  the ORIGIN subfield.
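
  A minimal Python sketch of building a native MSGID as described above
  follows. It is an illustration under assumptions, not a normative
  algorithm: it takes zlib.crc32 for the ZModem-style 32-bit CRC and
  binascii.crc_hqx with a zero start value for the XModem-style 16-bit
  CRC, and it emits lowercase hex digits because the document does not
  state a case. The function name make_msgid is made up.

```python
import binascii
import zlib


def make_msgid(text: bytes) -> str:
    """Return <hexno1><hexno2><hexno3> for the given message text."""
    # ASSUMPTION: zlib.crc32 matches the "as in ZModem" 32-bit CRC.
    crc32 = zlib.crc32(text) & 0xFFFFFFFF
    # 32-bit sum of all characters, starting from zero.
    total = sum(text) & 0xFFFFFFFF
    # ASSUMPTION: CRC-CCITT (0x1021) with start value 0, as in XModem.
    crc16 = binascii.crc_hqx(text, 0)
    # ASSUMPTION: lowercase digits; 8 + 8 + 4 characters, no separators.
    return "%08x%08x%04x" % (crc32, total, crc16)


msgid = make_msgid(b"123456789")
```

  The result is always 20 printable characters, so it fits a simple
  subfield. Receivers should still treat it as an opaque string, as the
  text above requires.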

  When a message is gated *from* another message format, it is first
  checked whether the message contains a MsgID supplementary line; if
  so, the MSGID contents are taken from there. Otherwise, the contents
  of the origination message format's message ID field are taken. If
  that field is binary, each of its bytes should be converted to its
  hexadecimal representation to produce an uninterrupted string of
  hexadecimal digits, say "1af262b577de" for some 6-byte binary number.
  If the origination address is a part of the origination message
  format's message ID field, its 32-bit CRC in hexadecimal should be
  appended to the already copied message ID without intervening data.

SUBFIELD: REPLYID (simple) [don't take that too literally]
  ID: 12
  Presence: Zero or one

  Contains the contents of the MSGID subfield of the message this
  message is a reply to; if the message is being converted from or to
  another message format, the same conversion techniques apply as for
  the MSGID subfield. This includes the usage of supplementary lines in
  cases similar to those described for the MSGID subfield; however, for
  the REPLYID subfield, not only a ReplyID, but also a ReplyAddr
  supplementary line is used. The reason will soon be obvious.

  Consider the following in an OFT message:

    ^AMSGID: 2:380/121.512 2ffbea7f
    ^AREPLY: 2:380/104.15 78024880

  When converted to EDX, it would read simply

    <MSGID> 2:380/121.512 2ffbea7f
    <REPLYID> 2:380/104.15 78024880

  But when converted back to OFT, the REPLY kludge could not be
  reconstructed because the replied-to message's origination address is
  not available. For that reason, the contents of the replied-to
  message's MSGID subfield are followed by a NULL character and the
  origination address of the replied-to message.

  The full format of the REPLYID subfield, therefore, reads:

    <original message's ID> <NULL> <original message's origination E-Address>
                                                                   =========

  Imagine the underlined "E-Address" string in block letters.

  Now, when a message is generated by an OFT system, it has a MSGID of,
  for example:

    ^AMSGID: 2:380/104.15 78024880

  The string is then "converted" to EDX format, simply:

    <MSGID> 2:380/104.15 78024880

  However, when the message is converted to OFT format again, the
  following message ID is created:

    ^AMSGID: ofs->FidoNet#2:380/104.15 <somenumber>

  <somenumber> contains the 32-bit CRC of the contents of the MSGID
  subfield shown just above. Of course, a MsgID supline is also
  prepended to the message text:

    &MsgID: 2:380/104.15 78024880

  The reason that ofs->FidoNet#2:380/104.15 appears in one place and
  just 2:380/104.15 in the other is that in the first case, the address
  was obtained from the ORIGIN subfield (which was converted to EDX
  format), while in the second case, the address is treated as a part of
  the original message ID. You should be able to work this out for each
  specific case.

  Later, a reply is generated by another OFT system that has the *.ID
  pair of, for example:

    ^AMSGID: 2:380/121.512 2ffbea7f
    ^AREPLY: 2:380/104.15 78024880

  When converted to EDX, it reads:

    <MSGID> 2:380/121.512 2ffbea7f
    <REPLYID> 2:380/104.15 78024880 <NULL> ofs->FidoNet#2:380/104.15

  Notice the original message's origination address after the REPLYID;
  it is retrieved from the first part of the ^AREPLY kludge in the
  message prior to its conversion.

  Now, when converted back to OFT:

    ^AMSGID: ofs->FidoNet#2:380/121.512 <sthelse>
    ^AREPLY: ofs->FidoNet#2:380/104.15 <somenumber>

  Here, <somenumber> is the same number as it was a few steps before,
  when the original message was converted back to OFT. This way, reply
  linking is possible even when messages get gated multiple times.

  Of course, along with the ^AREPLY and ^AMSGID kludges created in the
  last described step, MsgID and ReplyID supplementary lines are also
  added to the message text:

    &MsgID: 2:380/121.512 2ffbea7f
    &ReplyID: 2:380/104.15 78024880
    &ReplyAddr: ofs->FidoNet#2:380/104.15

SUBFIELD: TEXT (complex)
  ID: 1000
  Contents: text
  Presence: Zero or one

  The TEXT subfield contains plain text. The smallest unit of text above
  a character and a word is, however, not a line, but a paragraph that
  contains freely flowing text without intervening CRs. A CR (ASCII 13)
  is used to terminate a paragraph and start a new one. ASCII 141
  (softCR) is treated as a normal character.

  It is strongly recommended that, when displaying message text, lines
  of at least 78 characters in length be supported. When ASCII art is
  inserted in message text, this should ensure proper display of such
  messages on as many systems as possible.

  Message text is not to exceed 128k in length. However, implementations
  must be able to process all sizes of text up to that number of bytes.

  *Only actual message text* is allowed to be stored in the TEXT
  subfield. Although it is allowed to treat the tearline and originline
  as a part of message text when gating a message from OFT to EDX, it is
  not under any circumstances allowed for an EDX-compliant piece of
  software to actually generate any control information in the TEXT
  subfield. Such information has its place in other subfields; if there
  isn't any place to store it, it shouldn't be stored at all.

SUBFIELD: FILE (complex)
  ID: 1001
  Contents: Two ulongs followed by two null-terminated strings followed
            by unbounded data
  Presence: Zero or more

  Contains information about an enclosed file and the file itself.

  The first ulong contains the size of the file; it must match the
  number of bytes in the "unbounded data" field mentioned above.

  The second ulong contains the UTC date and time of the file's last
  update, in Unix format - the number of seconds since 00:00:00,
  01-Jan-1970.
The first string contains the short 8.3 filename consisting of characters 'A'..'Z', '0'..'9', and "_-!#$&()", without the quotes; it is treated as case insensitive.

The second string contains the full name of the file; any character from ASCII 32..126, up to 255 characters. Should the full filename equal the short one, both strings should be set to the same value.

The NULL that terminates the last of the above strings is immediately followed by the contents of the file.

Gating for networks that don't feature files attached to messages: the best approach is probably to place the uuencoded contents of the file in the message text.

Gating for networks that feature file attaches: save attached files to disk and attach them to the message. Use whatever format you wish to store other information about the file in the message's text. If the network format overwrites the message's subject when files are attached, save the subject to the message text using the Subj supline.


Passing a message
========================================================================

When an application passes an EDX message it has received from somewhere to another system using the EDX format again, the only data it is allowed (*and* required) to change are the TRACE and SENTTO subfields. See the format of the two subfields for further information.


Colors, fonts, inserted pictures, sound and whistles
========================================================================

EDX currently supports none of the above, because the complications they would introduce far outweigh their usefulness. If time proves the opposite, a special "FORMAT" subfield will be implemented that will dictate how to interpret message text, implementing all of the above while staying backward compatible. Implementing all this is relatively simple for message processors, but it complicates life for the authors of message editors.
I invite all authors of public mail editors to send me a message if they would like to implement GUI elements in their programs; if enough of us gather, we will produce specifications for the FORMAT subfield, and a special msgbase format will be developed to support this, most probably an extension to JAM (as it is the most flexible messagebase format available at the moment).


========================================================================
EDX message text supplements
========================================================================

EDX implementations that are expected to convert messages between EDX and some other format can make use of message text supplementary lines when a message's information would otherwise be lost in a non-EDX format.

Note that EDX supplementary lines, however contradictory it may seem, are under no condition to be used in EDX itself. They belong in message formats that place control information in the message text and do not have (enough) space reserved for some information the message carried before being converted from EDX into that format.

Also, for information for which there is sufficient space in the converted-to message format, no supplementary lines should be created; for example, there should be no Creator or Exporter supplementary lines in OFT Type-2 messages.

Supplementary line format is, exactly:

    <" &"><linetag><": "><data><EOL>

where:

    <linetag>  is the tag of the supplementary line (case sensitive)
    <data>     consists of ASCII characters 32-126
    <EOL>      is the end-of-line terminator specific to the converted-to message format, for instance <CR> for FTS-1, <CRLF> for RFC-822 etc.

A supplementary line must not exceed 79 characters.

All supplementary lines are placed immediately before the original message text. They are separated from it with an empty line, unless an empty line is impossible to insert in the converted-to message format.
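As an illustration of the line format above, here is a minimal Python sketch that writes and recognizes supplementary lines. The function names are hypothetical, and two details are assumptions rather than part of the text: the 79-character limit is taken to exclude the EOL terminator, and line tags are taken to consist of letters only.

```python
import re

# Assumption: tags are alphabetic; data is ASCII 32..126 as stated above.
SUPLINE_RE = re.compile(r"^ &([A-Za-z]+): ([\x20-\x7e]*)$")


def format_supline(tag: str, data: str, eol: str = "\r") -> str:
    """Return ' &<tag>: <data><EOL>' (default EOL is CR, as for FTS-1)."""
    if any(not 32 <= ord(ch) <= 126 for ch in data):
        raise ValueError("data must consist of ASCII 32..126")
    line = f" &{tag}: {data}"
    if len(line) > 79:  # assumption: the limit does not count the EOL
        raise ValueError("supplementary line exceeds 79 characters")
    return line + eol


def parse_supline(line: str):
    """Return (tag, data) for a supplementary line, or None for plain text."""
    match = SUPLINE_RE.match(line.rstrip("\r\n"))
    return (match.group(1), match.group(2)) if match else None
```

A converter going back to EDX could run parse_supline over the leading lines of the message text and stop at the first empty line or the first line that does not parse.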
When a message with supplementary lines is converted (back) to EDX, the below-defined supplementary lines should be converted to their subfield representation. Unknown supplementary lines should be left untouched.

Note that supplementary lines should be treated as a part of message text equal to the text itself; they are human readable, only their format is such that also a program can read them. Therefore, it is natural, for example, to store EDX supplementary lines after the SOT and before the EOT kludge in OFT messages.

MsgID      " &MsgID: <text><EOL>"

Contains the contents of the MSGID subfield. (See MSGID subfield)

ReplyID    " &ReplyID: <text><EOL>"

Contains the contents of the REPLYID subfield up to, but excluding the NULL character. (See REPLYID subfield)

ReplyAddr  " &ReplyAddr: <text><EOL>"

Contains the contents of the REPLYID subfield from, but excluding the NULL character. (See REPLYID subfield)

Creator    " &Creator: <text><EOL>"

Contains the contents of the CREATOR subfield should nothing equivalent be featured by the converted-to message format.

Exporter   " &Exporter: <text><EOL>"

Contains the contents of the EXPORTER subfield should nothing equivalent be featured by the converted-to message format.

Origin     " &Origin: <name>, <E-Address><EOL>"

Contains the name and address of the actual message sender if the converted-to message format cannot (safely) hold their entire name or address as it was originally. In OFT messages, the Origin supplementary line is always written.

Dest       " &Dest: <E-Address><EOL>"

For netmail (and equivalent) messages only: contains the address of the system to route the message to if the converted-to message format cannot (safely) hold the entire address. In OFT netmail messages, this supplementary line is always written.

Author     " &Author: <name>, <E-Address><EOL>"

Contains the name of one of the message's authors if the converted-to message format doesn't support anything parallel to the AUTHOR subfield.
One line is used for each author; *all* authors should be specified.

Whoto      " &Whoto: <name><EOL>"

Contains the name of one of the message's recipients if there are more than the converted-to message format supports. One line is used for each recipient; *all* recipients should be specified. The Whoto line only applies to echoed messages; for netmail messages, multiple copies of the original message should be created.

Subject    " &Sbj: <text><EOL>"

Contains the entire message's subject if there is not enough space for it in the converted-to message format.


The following chapter is independent of the EDX specifications. It is a recommendation for integrating EDX with other specifications, and should, indeed, be placed in a file by itself. Don't worry, it will be, as it evolves.

========================================================================
EDX Recommendations
========================================================================

The following are implementation recommendations intended to avoid a chaos of different, non-interoperable EDX implementations. In order to achieve that goal, each developer is highly encouraged to develop their software with them in mind.

"ERX" is an abbreviation for "EDX Recommendations". ERX does *not* equal EDX; if one decides to implement EDX, they aren't bound to also follow the ERX specifications. However, for cases described herein, it is highly desirable that these specifications, too, be followed.

In FidoNet, for example, common practice has been to separate the system into two major parts, the mailer and the tosser, where the mailer formally operates the level 3 layer and the tosser formally operates the level 4 layer. But in reality, the tasks are commonly mixed up; the program referred to as the mailer does things that belong to the fourth layer (call scheduling, for example), and still these functions are considered the property of the mailer.
Newer software, though, would use a different approach: there would be a single central system coordinating module (the "tosser") whose task would be to process mail and schedules, and that module would use the lower-lying modules ("mailers") to perform mail sessions.

While these modules are kept in the same executable, there's no real problem exchanging data between them. But in reality, this cannot be the case for full-fledged packages; and, frequently, the two modules used are not from the same author. The most practical way to exchange data between them is, then, through the underlying operating system's file system.

The session module needs the following data to be sent to it by the controlling module:

  * whether the session has been initiated locally or remotely
  * the protocols that should be taken into consideration when attempting to initialize the session, in descending order
  * the list of mail and requests to be sent to the remote

Of course, the above short list contains nothing that could not be specified to the session handler on the command line when an outbound session is established. The problem is the mail list when somebody else calls in; with incoming sessions, there is no way to tell who is attempting the connection before the session has been established. That's why the session module should have some means to scan through the entire list of mail to be sent to all systems and pick out the items destined for the current partner-in-session.


Recommended in-transit mail storage
========================================================================

As stated above, probably the optimal way to exchange data between modules is the underlying file system. When the mail is stored in files (mail for separate systems in separate files, of course), there are, basically, two ways of storing it: unchanged or changed.
When it is unchanged, it is assumed that the file contains an arbitrary number of mail items; see below for a definition of such a format. The only reasonable way to change the mail packets, on the other hand, is to compress or encrypt them. Therefore, we need three types of files that we can tell apart just by checking the filenames. Encrypted packets aren't covered in ERX.

Adding mail packets to files containing unchanged mail is relatively easy. On the other hand, with compressed mail, one would have to unpack the file, add the mail packets, and recompress it; a relatively major pain in the ass. That's why compressed mail containers contain a variable number of uncompressed mail container files, to which another can quickly be added when necessary.

ERX defines no standard mail compressing protocol; it is up to the implementation to scan the compressed mail container for a format ID and run the appropriate decompression module, be it ZIP, ARJ, LHA or whatever.

For uncompressed mail containers, the naming convention is:

    <somethng>.ERX

while for compressed mail containers, it is:

    <somethng>.EC<n>

<somethng> consists of exactly 8 *hex* digits ('0'..'9', 'A'..'F').
<n> is a number in base 36 (0..9, A..Z), described a paragraph or two below.
The names are case insensitive.

Naming algorithm: the contents of <somethng> generally don't matter. However, for compressed mail containers, an optimal algorithm would compute the 32-bit CRC of our address and the address of the system the file is destined for, while for uncompressed containers, the algorithm would simply make <somethng> a number that is incremented each time a new uncompressed mail container is created for a specific system.

<n> comes into use when a parallel task wants to store new compressed mail for a system while that particular system is on-line and receiving its mail; then, a new compressed file is created with a higher value of <n>.
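The naming convention can be sketched as follows. The function names are hypothetical, and several details are assumptions not fixed by the text: the two addresses are simply concatenated before the CRC is taken, the CRC is the zlib-style CRC-32, and <n> is a single base-36 digit so that the extension stays three characters long.

```python
import zlib

BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def uncompressed_name(counter: int) -> str:
    """<somethng>.ERX, where <somethng> is a per-system counter in hex."""
    return f"{counter & 0xFFFFFFFF:08X}.ERX"


def compressed_name(our_address: str, dest_address: str, n: int = 0) -> str:
    """<somethng>.EC<n>, where <somethng> is a CRC over both addresses."""
    if not 0 <= n < 36:
        raise ValueError("n must fit in one base-36 digit")
    # Assumption: "CRC of our address and the destination address" is
    # computed over the two strings joined together.
    joined = (our_address + dest_address).encode("ascii")
    crc = zlib.crc32(joined) & 0xFFFFFFFF
    return f"{crc:08X}.EC{BASE36[n]}"
```

With this scheme, every compressed container for the same pair of systems shares the same base name, and only <n> changes when a parallel task has to start a new file.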
Note that there is a catch with processing received mail. No one guarantees that two uncompressed mail containers from two separate systems will have different names. Therefore, when raw uncompressed mail containers are received, care should be taken to rename them in the event of a name clash, and when compressed mail containers are received, only one at a time should be unpacked and processed.

Also note that the names of files when received on the destination system need not match the filenames as they were on the origination system. In the event of a name clash, implementations are allowed, indeed expected, to rename the files as appropriate.


Format of the above mentioned uncompressed mail containers
========================================================================

An uncompressed mail container (a packet) consists of a binary header and an arbitrary number of mail items; for now, EDX messages.

For the sake of upgradeability, each item is preceded with a 4-byte unsigned long integer representing its length in bytes and a 4-byte unsigned long integer representing the type of the item, so that implementations of lower EDX levels can skip items they do not know about in the future.

Uncompressed mail containers are protected using envelopes that optionally include password protection. An envelope is a 32-bit value that is used to check the packet's authenticity. For non-password-protected packets, the envelope is simply the 32-bit CRC of all data beyond the packet header. For password-protected packets, however, the procedure is a bit longer.

In the latter case, the first part of computing a packet's envelope is to generate the packet's key: a 32-bit CRC, a 32-bit checksum and a 16-bit CRC of all data beyond the packet's header are computed. The checksum is a 32-bit value that represents the sum of all bytes the mentioned data consists of. The 32-bit CRC is the one used in ZModem; the 16-bit CRC is the one used in XModem.
When the three values are computed, they are copied into an array of 10 bytes that represents the packet's key: first the 32-bit CRC (4 bytes), then the checksum (ditto), then the 16-bit CRC (2 bytes). Then, the packet's password is encrypted with the resulting key, and the 32-bit CRC of the resulting encrypted password is the packet's envelope.

The encryption algorithm is:

    newdata[i] = origdata[i] * thekey[(i MOD sizeof(thekey))]

The arrays newdata, origdata and thekey are assumed zero-based. The newdata and origdata arrays are assumed to have the same size. The i variable is assumed to have the range of 0..[origdatalength-1].

If there is no data following the packet's header, the envelope should be set to -1 (0xffffffff).

The packet's envelope is checked by computing a separate version of the envelope and comparing it to the envelope that is stored in the packet's header.

Header structure:

    char  signature[4]      // 'E', 'R', 'X', ASCII 0
    ulong hdrsize           // Size of the packet's header in bytes
    ulong envelope          // Packet's envelope, see above
    char  origaddress[101]  // Null-terminated origin E-Address
    char  destaddress[101]  // Null-terminated dest E-Address
    char  creatorprog[51]   // The program that created the *packet*

The size of the packet header may increase in future ERX levels higher than 1. However, future packet headers will stay compatible with ERX level 1; an ERX implementation is, when implementing packets as described here, expected to be able to process all revisions of the packet header with the help of the information stored in hdrsize.

The signature field *must* match 'E', 'R', 'X', NULL in order for the packet to be processed. The comparison of 'E', 'R' and 'X' should be performed exactly - case sensitively.

The origaddress and destaddress fields specify the origination and destination addresses of the packet, *not* the messages in it.
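The envelope computation and the header layout can be sketched together in Python. This is an illustrative reading, not a reference implementation, and it rests on assumptions the text leaves open: multi-byte values are stored little-endian, the byte product in the encryption step is truncated to 8 bits, the ZModem CRC-32 matches zlib.crc32, the XModem CRC-16 matches binascii.crc_hqx with a zero start value, and the function names are made up.

```python
import binascii
import struct
import zlib

# signature, hdrsize, envelope, origaddress, destaddress, creatorprog
HEADER = struct.Struct("<4sII101s101s51s")  # 265 bytes, little-endian assumed


def packet_key(body: bytes) -> bytes:
    """10-byte key: CRC-32, 32-bit byte sum, CRC-16 of the data after the header."""
    crc32 = zlib.crc32(body) & 0xFFFFFFFF
    checksum = sum(body) & 0xFFFFFFFF
    crc16 = binascii.crc_hqx(body, 0)
    return struct.pack("<IIH", crc32, checksum, crc16)


def encrypt(data: bytes, key: bytes) -> bytes:
    """newdata[i] = origdata[i] * thekey[i MOD len]; low 8 bits kept (assumption)."""
    return bytes((b * key[i % len(key)]) & 0xFF for i, b in enumerate(data))


def envelope(body: bytes, password: bytes = None) -> int:
    if not body:
        return 0xFFFFFFFF  # no data after the header: envelope is -1
    if password is None:
        return zlib.crc32(body) & 0xFFFFFFFF
    return zlib.crc32(encrypt(password, packet_key(body))) & 0xFFFFFFFF


def build_header(body, orig, dest, creator, password=None) -> bytes:
    """Pack an ERX level 1 header for the given packet body."""
    for text, limit in ((orig, 100), (dest, 100), (creator, 50)):
        if len(text) > limit:
            raise ValueError("field too long to stay null-terminated")
    return HEADER.pack(b"ERX\x00", HEADER.size, envelope(body, password),
                       orig.encode("ascii"), dest.encode("ascii"),
                       creator.encode("ascii"))
```

A receiver would recompute envelope(body, password) over the bytes that follow hdrsize bytes of header and compare the result with the stored value.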
Since the ERX packet is a temporary structure created and known only between two directly connected systems and is not to be routed, a destaddress would normally not be needed, but it is present in case the packet ends up on a different system from the one it was destined for.

The creatorprog field contains a banner for the program that created the *packet* (not the messages in it), say: "MailMangle v1.24 build376". Only characters in the range of ASCII 32..126 are allowed.

The header is followed by zero or more items (for now messages only), each preceded with the following structure:

    ulong itemtype
    ulong itemlen

itemtype 0 stands for a message; no other values have been defined as of yet. If an unknown itemtype field is encountered, the item should be skipped.


Coexistence of ERX packets with Old FidoNet Technology Type-2+ packets
========================================================================

When an ERX implementation sends mail to a system using OFT Type-2+ packets, it should signify the availability of ERX by setting the Capability Word as if it supported Type-16 packets; that is, the 14th bit of CW (starting from 0 = 0x01) is set to 1. The CWValidation field should, of course, be set according to the generated CW.


Coexistence of ERX packets with Old FidoNet Technology Type-2 packets
========================================================================

When sending mail to a system using OFT Type-2 packets as described in FTS-1 r15, a Type-2+ header is generated instead of a raw Type-2 one, and the CW and CWV fields are used as described above.


Recommended mail list format
========================================================================

The recommended mail list format consists of the main data file and the index file, named <basename>.MLD and <basename>.MLX, respectively. <basename> should be user-definable.
Note that the mail list base is not used as an intermediary between a tosser and a mailer as used in, for example, FidoNet, but between the program that has already established a mail connection and the program that is actually performing the mail transfer. Therefore, this type of mail list is of no use with traditional mailers like FrontDoor or BinkleyTerm.

All applications should open the mail list files in shareable (DENYNONE) read/write or readonly mode. An exception is granted to maintenance utilities, which should open the mail list files in exclusive, DENYALL sharing mode. If a normal application (i.e., not a maintenance utility) attempts to write to the mail list, it must first attempt to lock the first byte of the data file. The application should under no circumstances attempt to write to the mail list if it could not lock the data file. After successfully locking the mail list, the program should write what it has to write as quickly as possible and then release the lock.


The data file
=============

The data file consists of a 1024-byte binary header, followed by subfields of base type, each listing a file or a request and its destination.

The binary header format is:

    char hrsig[30]

The hrsig field contains the following human readable signature: "Mail list data file (binary)", followed by #26 (^Z), followed by the terminating null.

The rest of the data file is built of base subfields. Subfield IDs are somewhat peculiar: they are used to hold special attributes of the mail item. The first 16 bits (0..15) are used as a part of the normal ID, while the other 16 (16..31) have special meanings that depend on the type of subfield. This is equivalent to splitting the subfield ID in half and naming the second half "subfield attributes" instead. The unused attribute bits should be set to zero when writing the field to the file.

The maximum length of any given subfield is 512 bytes.
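Splitting a mail list subfield ID into its two halves is a matter of masking and shifting. The sketch below uses hypothetical function names and assumes that bit 0 is the least significant bit of the 32-bit ID.

```python
def split_subfield_id(raw_id: int):
    """Return (id, attributes): bits 0..15 and bits 16..31 of the raw ID."""
    return raw_id & 0xFFFF, (raw_id >> 16) & 0xFFFF


def make_subfield_id(base_id: int, attributes: int = 0) -> int:
    """Combine a 16-bit ID and 16 attribute bits into one 32-bit value."""
    return ((attributes & 0xFFFF) << 16) | (base_id & 0xFFFF)


def attribute_bit(raw_id: int, bit: int) -> bool:
    """Test one attribute bit, numbered 16..31 as in the text."""
    if not 16 <= bit <= 31:
        raise ValueError("attribute bits are numbered 16..31")
    return bool((raw_id >> bit) & 1)
```

Writing make_subfield_id(base_id) with no attributes leaves all attribute bits at zero, which matches the requirement for unused bits.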
SUBFIELD:          FILE
ID, low 16 bits:   0
ID, high 16 bits:

    31: If set, the file should only be sent on inbound connections with the specified system. If not set, the file should be sent on outbound connections as well.
    30: If set, the file contains ERX mail, no matter what its filename.
    29: If set, the file contains OFT mail, no matter what its filename. If bit 30 is set, bit 29 should be ignored. If neither bit 30 nor bit 29 is set, the file is assumed normal.
    28: If set in combination with 30 or 29, the mail is stored in raw, unmodified (compressed, encrypted) packets; otherwise ignored.
    27: If set, the file should be deleted after it is sent in its entirety.
    26: If set, the file's size should be set to zero after it is sent in its entirety. If bit 27 is set, too, this bit should be ignored and bit 27 honored.

Contents:          Two null-terminated strings

The first string specifies the E-Address of the system the subfield applies to, while the second string specifies a file to be transferred to that system. Exactly one file per subfield can be specified. The maximum length of the first string is 100 characters. The maximum length of the second string is 255 characters.


SUBFIELD:          REQUEST
ID, low 16 bits:   1
ID, high 16 bits:  undefined. (Zeroed)
Contents:          two or three null-terminated strings

The first string specifies the E-Address of the system the subfield applies to, while the second string specifies the filename to request from the remote system. Wildcards are allowed. Should a password be required, it should be specified in the third string. If the second string contains a full path and filename, it is to be treated as an update request. Exactly one request per subfield can be specified. The maximum length for the first and third (optional) string is 100 characters. The maximum length for the second string is 255 characters.
SUBFIELD:          TEST
ID, low 16 bits:   65535
ID, high 16 bits:  depends on implementation
Contents:          A null-terminated string and undefined data

Intended for various experiments, this subfield contains one null-terminated string specifying the program that is making the experiments, followed by data specific to that program. When another program's (= unknown) TEST subfield is encountered, it should be ignored.


The index file
==============

The mail list index file is built of a binary header and an arbitrary number of binary records.

The binary header format is:

    char hrsig[31]

The hrsig field contains the following human readable signature: "Mail list index file (binary)", followed by #26 (^Z), followed by the terminating null.

Each binary record corresponds to a subfield in the mail list data file. The binary record format is:

    ulong addresscrc
    ulong subfpos
    ulong subfid

Here, addresscrc specifies the 32-bit CRC of the E-Address used by the subfield; it equals -1 if the subfield is not of a type that would contain a single E-Address (no such mail list data file subfield is defined as of yet). subfpos specifies the absolute position of the subfield in the data file, and subfid specifies the ID of the subfield.


Subfield deleting
=================

When a subfield is processed (i.e., a file is sent or a request is made), it should be deleted. Since it would be rather awkward to actually delete the subfield, deletion is done by setting all the fields of the respective subfield's index record to -1 and setting the subfield's ID in the data file to -1. Actual physical deletion of subfields is left to some sort of packing program, as used by similar databases.
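To show how the index lets a session module pick out the entries for its current partner, here is a minimal Python sketch of the index record. The function names are hypothetical; little-endian storage and the zlib-style CRC-32 are assumptions, and index_body stands for the bytes that follow the 31-byte index header.

```python
import struct
import zlib

RECORD = struct.Struct("<III")  # addresscrc, subfpos, subfid
MINUS_ONE = 0xFFFFFFFF          # -1 as an unsigned 32-bit value


def make_record(address, subfpos: int, subfid: int) -> bytes:
    """Pack one index record; pass address=None for subfields without one."""
    if address is None:
        crc = MINUS_ONE
    else:
        crc = zlib.crc32(address.encode("ascii")) & 0xFFFFFFFF
    return RECORD.pack(crc, subfpos, subfid)


def deleted_record() -> bytes:
    """A deleted entry has all three fields set to -1."""
    return RECORD.pack(MINUS_ONE, MINUS_ONE, MINUS_ONE)


def find_for_address(index_body: bytes, address: str):
    """Yield (subfpos, subfid) for live records whose CRC matches address."""
    wanted = zlib.crc32(address.encode("ascii")) & 0xFFFFFFFF
    for offset in range(0, len(index_body) - RECORD.size + 1, RECORD.size):
        crc, pos, sid = RECORD.unpack_from(index_body, offset)
        if (crc, pos, sid) == (MINUS_ONE, MINUS_ONE, MINUS_ONE):
            continue  # deleted entry, waiting for a packing run
        if crc == wanted:
            yield pos, sid
```

Since different addresses can share a CRC, a careful caller would still read the subfield at subfpos and compare the full E-Address before sending anything.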
Recommended logical connection layer standard
========================================================================

I stick to the rule that any network layer should cooperate with as many other different network layers as possible, and that's why I leave it to the network that is about to implement EDX to decide which first, second and third layers to use. FTN networks will probably want to stay with (or upgrade to) EMSI and Hydra.


========================================================================
Evolution considerations
========================================================================

As EDX evolves, care will be taken for each higher level of EDX to be a superset of the prior versions, so that a higher-level program will be able to process lower-level EDX packets without even having to know that they are from a level lower than the highest supported.

Also, a lower-level program will be able to process higher-level EDX packets as long as it ignores unknown fields and subfields; also, in binary or string structures, it should ignore all extra data beyond the known structures, so that lower-level software won't choke on a packet if a new substring is added to, say, string 2 of the SYSINFO subfield.

However, a lot of information would be lost with such superset-to-subset conversions; therefore, a received mail packet should (=must) be passed to downlinks with all locally unknown information included, with only the known fields updated, if necessary.

I still do, of course, strongly encourage everyone, especially sites with many direct or indirect downlinks, to use as recent software as possible.


========================================================================
Considerations on upgrading from and coexisting with other mail formats
========================================================================

Mail format coexistence is often required for a big network to be able to upgrade itself smoothly to a new and better mailing technology.
Generally, the implementation is such that when sending mail to another system, the application places somewhere a sign about other mail formats it supports; this sign is, naturally, not defined in the specifications of the mail format the sign is in, but rather in the specifications of the mail format the sign stands for. If the destination system also supports the "signed" mail format, it uses it the next time it sends mail to the system that sent the sign. When that system receives mail in the new format, it too switches to it the next time it sends mail back.

Note that, when converting a message to EDX format, each piece of information should be converted if and only if:

  a) An official supplement has been added to EDX explaining how and if that information should be stored in EDX

or

  b) EDX already has space defined to store that information; for example, the contents of the OFT MSGSEQ kludge should be stored in the seqno header field.

or

  c) The official standard describing that information contains instructions on how to store the information in EDX messages.

The exact instructions as present in any of the above cases should be followed. Instructions in an official EDX supplement have precedence over the original EDX specifications, while the original EDX specifications have precedence over instructions made by a third party as described in the third case. That means that if someone invents a great new WhizBang mail format and says that message sequential number information should be stored in an additional subfield, that information should nevertheless be stored in the appropriate field in the message header.

If none of the above described cases in which information can be stored in an EDX message applies, please contact me - either privately, through E-Mail or snailmail, or through the FidoNet Net_Dev echo. All proposals are welcome.

/// EOF */